Motivation and Objectives

In September, 2014, the Federal Bureau of Investigation (FBI) published a report, “A Study of Active Shooter Incidents in the United States Between 2000 and 2013,” starting that the incidence of school and mass shooting incidents has been drastically increasing since the turn of the century \(^{[1,2]}\). According to a recent study, on an average, the United States experiences approximately one mass killing every two weeks and one school shooting every month \(^{[4]}\). In fact, just on November 14th, 2019, there was a shooting at a high school in Santa Clarita, California where a 16-year-old opened fire at the school, killing 2 people and injuring 3 others \(^{[3]}\). Such an increasing trend has raised serious public concerns about school safety and questions the efficacy of the existing gun laws in different states. To prevent such traumatic incidents, state legislators should consider implementing effective gun policies that would bring notable reduction in school shootings. We believe that an in-depth analysis of the impact of gun policies of every state on school shooting may help convince state legislators to push to establish appropriate interventions. Thus, the primary aim of this study is to investigate the association between the strength of state gun laws with the incidence of school shootings.

Hypothesis Questions

School shooting is a complex phenomenon as it is connected with various crucial components such as: mental health, bullying, school safety measures, socioeconomic condition and most importantly availability of guns. The goal of the project was to investigate the impact of gun policies on the school scooting incidences for different states. The analysis was designed to provide valuable insights to the state legislators about the impact of gun policies on the school shooting incidents. In order to analyze school shooting in depth, we scrutinized the following aspects of School Shooting within each state over the time period:

  1. Overall Impacts of the gun laws
  2. Affects of bullying laws and prevalence of bullying
  3. Prevalence of mental illness
  4. Prevalence of Bullying
  5. School Safety Measures
  6. Affects of Media coverage
  7. Impact of hostility of school environment
  8. Impact of poverty prevalence rate
  9. Impact of specific gun laws

Data

School Shootings

The dataset was obtained from the Center for Homeland Defense and Security website. It contained details about the school shootings incidents that took place in the United States over the years 1970 to 2019. The data were gather from different sources such as: newspaper article, online news report, court records or police report, and hundreds of news sources, statement/interview from law enforcement official, news report on court ruling. Each row of the dataset indicated an incident that occur at a specific date and time. The dataset contained some insightful variables such as:

  • Location of incident
  • Age/gender/race of shooter
  • Number of victims
  • Number of wounded
  • Age of victim(s)
  • Time of day/day of week
  • Motive/category
  • History of bullying (of shooter)
  • History of domestic violence
  • Weapon type
  • etc.

The dataset also contained a variable named Reliability Score which indicated the of sources of about the incidents. Reliability was scored on a 5-point scale classified by where the data were obtained from with the following scoring assignment:

  1. Independent Single Author/Moderator Blog, report/list lacking citations, or cited source cannot be located (e.g., newspaper headline title and story does not appear in searches)
  2. Single Newspaper Article, Online News Report (Network, Cable, Online)
  3. Multiple News Sources
  4. Hundreds of News Sources, Statement/Interview from Law Enforcement Official, News report on court ruling
  5. Court Records or Police Report

We determined that observations with a reliability score of 1 may not be accurate, and hence discarded these observations from further analysis. The dataset also had a variable indicating the categories of the shooting incidences. The category marked as “Accidental” was also removed from further analysis. As the study focuses on the number of incidences in each State, the total number of incidents that occurred in each State per year was calculated.

# variable to indicate if raw data should be imported or (if false) cleaned csv should be used
import_raw_data = FALSE

if(import_raw_data){
# read in data from local csv file
ss = read_csv("../../data_1/school_shootings.csv")

# first row has column names
colnames(ss) = ss[1,]

ss = ss[-1,]%>%
  clean_names()%>%
  select(-c(na,na_2))

# columns after row 1426 were shifter in original table
# read in second data frame to fix this
ss2 = read_csv("../../data_1/ss_pt2.csv")%>%
  select(-X1)
colnames(ss2)=colnames(ss)[34:ncol(ss)]

# update tail of dataframe
ss[(1426:nrow(ss)),(34:ncol(ss))]=ss2

# fix age class
ss$shooter_age = as.numeric(ss$shooter_age)

# create cleaned csv file
write.csv(ss,"ss_data.csv")
}

# ss is csv file with school shooting data (individual incidents) created above
ss = read_csv("ss_data.csv")%>%
  select(-X1)%>%
  mutate(year = as.numeric(substr(date, (nchar(date)-3),nchar(date))))%>%
  mutate(state = ifelse(state == "St. Croix, US Virgin Islands", 
                        "St.Croix",state))%>% #shorten name
  distinct_at(.vars = c(1:24), .keep_all = T)%>%
  filter(category != "Accident")%>% # remove accidents from dataset
  filter(reliability_score_1_5!=1) # remove unreliable data 
sts = unique(ss$state)
yrs = seq(range(ss$year)[1],range(ss$year)[2],by=1)

st_yr = data.frame(state = rep(sts,length(yrs)), year = sort(rep(yrs,length(sts))))%>%
  arrange(state)

State Grades (Gun Policy)

The main purpose of this analysis was to quantify the effect of gun policies on the school shooting incidences for each state. We found a dataset that contains annual gun law score for each state over the years 2010 to 2018. The dataset was obtained from the Gifford Law Center. For each state the annual score was graded on an A to F scale with higher grade states having more restrictive gun policies. As we also wish to explore the effect of changes in the score, it might be more convenient to convert the grades into GPA scale. So the grades for each state was transformed into a GPA score using the school GPA grading scale (out of 4.00).

# read in state grade data
if(import_raw_data){
# on csv file per year

# 2018
grades18 = read_csv("../../data_1/2018grades.csv")%>%
  select(-c(X6,X7))%>%
  clean_names()%>%
  select(-gun_death_rate_per_100k)%>%
  mutate(year = "2018")%>%
  rename(grade = x2018_grade)%>%
  select(state,grade,year)

# 2017
grades17 = read_csv("../../data_1/2017grades.csv")%>%
  clean_names()%>%
  mutate(year = "2017")%>%
  rename(grade = x2017_grade)%>%
  select(state,grade,year)

# 2016
grades16 = read_csv("../../data_1/2016grades.csv")%>%
  clean_names()%>%
  mutate(year = "2016")%>%
  rename(grade = x2016_grade)%>%
  select(state,grade,year)

# 2015
grades15 = read_csv("../../data_1/2015grades.csv")%>%
  clean_names()%>%
  mutate(year = "2015")%>%
  rename(grade = x2015_grade)%>%
  select(state,grade,year)
# add in missing value
grades15 = rbind(grades15,c("Kansas","F",2015))

# 2014
grades14 = read_csv("../../data_1/2014grades.csv")%>%
  clean_names()%>%
  mutate(year = "2014")%>%
  rename(grade = x2014_grade)%>%
  select(state,grade,year)

# 2013
grades13 = read_csv("../../data_1/2013grades.csv")%>%
  clean_names()%>%
  mutate(year = "2013")%>%
  rename(grade = x2013_grade)%>%
  select(state,grade,year)

# 2012
grades12 = read_csv("../../data_1/2012grades.csv")%>%
  clean_names()%>%
  mutate(year = "2012")%>%
  rename(grade = x2012_grade)%>%
  select(state,grade,year)

# combine all year data 
grades = bind_rows(grades18,grades17)%>%
  bind_rows(grades16)%>%
  bind_rows(grades15)%>%
  bind_rows(grades14)%>%
  bind_rows(grades13)%>%
  bind_rows(grades12)

# write cleaned csv file
write.csv(grades,"state_grades.csv")
}



grades = read_csv("state_grades.csv")%>%
  select(-X1)

gpa_convert = data.frame(letter_grade = c("A","A-","B+","B","B-",
                                          "C+","C","C-","D+","D","D-","F"),
                         gpa = c(4,3.7,3.3,3,2.7,2.3,2,1.7,1.3,1,0.7,0))

Gun Laws

The analysis also wished to investigate the impact of different gun laws separately and thus considered the RAND dataset. The dataset was obtained from the ______________ website. The dataset includes information about the changes in laws regarding firearms for each state since the year 1979. The dataset contains variables which informs about the type of the law, whether the change was restrictive or permissive, effective date and end date. There are total 12 types of laws and each change was designated as either restrictive or permissive. As some of the laws change in the middle of a year, the year was rounded to the nearest year (cutoff point of July 1st). Each permissive law was assigned the value -1 and each restrictive law was assigned as +1. Then, for each category and for each state the total number of permissive and restrictive laws were measured and a total score was calculated. The different types of laws were further merged and categorized in 4 categories into the following manner:

  1. Access: Minimum age, Prohibited Possessor, Child Access
  2. Screening: Waiting period, Registration, Background checks, Dealer license, Safety training, permit, one gun per month
  3. Ban: Firearm ban
  4. Transport: Castle Doctrine, Open carrying not restricted, Carrying a concealed weapon
# Do not execute because final product is already saved as a CSV


if(FALSE){
  # import the raw data
data.rand=read.csv("rand.laws.database.csv",stringsAsFactors = FALSE)

# combine law types by a simple set of phrases which all laws contain exactly one of 

types=c("background check","carrying","castle doctrine","child access","dealer license","age","registration","waiting period","prohibited possessor","safety training","ban","permit","open carry","one gun per")
match.df=as.data.frame(unique(data.rand$Type.of.Law))
match.df <- data.frame(lapply(match.df,as.character),stringsAsFactors=FALSE)
names(match.df)<-"type"
match.df$abv.name=0

i=1
j=1
flag=0
while(i<=nrow(match.df)){
      j=1
      while(j<=length(types)&flag==0){
            if(any(grep(types[j],match.df$type[i],ignore.case=TRUE))){
                  match.df$abv.name[i]=types[j]
                  flag=1
                  }
            else{
                  j=j+1
            }
      }
      flag=0
      i=i+1
}

names(data.rand)[5]="type"
data.rand=left_join(data.rand,match.df,by="type")
names(data.rand)[23]<-"type.of.law"

# Only keep laws which have non-missing values for the type of change and effect, and remove DC

data.rand=filter(data.rand,Type.of.Change=="Implement"|Type.of.Change=="Modify"|Type.of.Change=="Repeal")
data.rand=filter(data.rand,Effect!="")
data.rand=filter(data.rand,State!="District of Columbia"&State!="DIstrict of Columbia")

# Determine the years in which the law is in effect by rounding the start and end dates to the nearest year. For laws which are not superceded by newer laws, set the end date to 2020 (our analyses do not extend to 2020 so this is equivalent to saying the law never ended).

data.rand$Supercession.Date[data.rand$Supercession.Date==""]="1/1/2020"
data.rand$end.year=year(mdy(data.rand$Supercession.Date))
data.rand$end.month=month(mdy(data.rand$Supercession.Date))
data.rand$start.year=year(mdy(data.rand$Effective.Date))
data.rand$start.month=month(mdy(data.rand$Effective.Date))
data.rand$start.year=data.rand$start.year+ceiling(data.rand$start.month/6)-1
data.rand$end.year=data.rand$end.year+ceiling(data.rand$end.month/6)-1
data.rand=clean_names(data.rand)

year=2007
blank.df=NULL
while(year<=2019){
      blank.df=data.rand%>%filter(start_year<=year,end_year>year)%>%count(state,type_of_law,effect)%>%cbind(year)%>%rbind(blank.df)
      year=year+1
}
skel=expand.grid(unique(data.rand$state),unique(data.rand$type_of_law),unique(data.rand$effect),2007:2019)
skel=as.data.frame(skel)

# count the number of laws in effect for each state by year by type of law and effect (permissive or restrictive)

names(skel)<-c("state","type_of_law","effect","year")
skel$state=as.character(skel$state)
rand.count.data=merge(skel,blank.df,by=c("state","type_of_law","effect","year"),all=TRUE)
rand.count.data[is.na(rand.count.data$n),"n"]=0
names(rand.count.data)[5]<-"count"
rand.count.data=unite(rand.count.data,"law_effect",c("type_of_law","effect"),sep="_")%>%spread("law_effect","count")%>%clean_names()
write.csv(rand.count.data,"cleaned.rand.csv")
}





# data set to go from state names to abbreviations
states = data.frame(state.name,state.abb)
dc = data.frame(matrix(ncol= 2, nrow = 1))
colnames(dc)=colnames(states)
dc$`state.name` = "District of Columbia"
dc$`state.abb` = "DC"

states = rbind(states,dc)
states$state.name = as.character(states$state.name)


rand.count.data = read_csv("cleaned.rand.csv")%>%
  select(-X1)%>%
  left_join(states, by = c("state"= "state.name"))%>%
  select(-state)%>%
  rename(state = state.abb)

Mental Health

The literature review shows strong influence of mental illness on school shooting incidences. Thus we collected a dataset that contains the prevalence rate of youth with severe mental disorder. The dataset was obtained from_____________. The dataset contained the percentage of youths who has been diagnosed with major depression but who did not receive any mental health treatment for each state over the years 2010 to 2016.

# cleaning mental health data. commented out because it is already saved as a csv file

if(FALSE){
# read in each year's csv file
data.2016=read.csv("mha2016.csv",stringsAsFactors = FALSE)
data.2017=read.csv("mha2017.csv",stringsAsFactors = FALSE)
data.2018=read.csv("mha2018.csv",stringsAsFactors = FALSE)
data.2019=read.csv("mha2019.csv",stringsAsFactors = FALSE)
data.2020=read.csv("mha2020.csv",stringsAsFactors = FALSE)

names(data.2017)[2]<-"State"

data.2016=data.2016[,c("State","Percent","Year")]
data.2017=data.2017[,c("State","Percent","Year")]
data.2018=data.2018[,c("State","Percent","Year")]
data.2019=data.2019[,c("State","Percent","Year")]
data.2020=data.2020[,c("State","Percent","Year")]
states=sort(unique(data.2020$State))
# Remove DC
states=states[-9]
# Find which states don't match up for each year

states==sort(unique(data.2016$State))
sort(unique(data.2016$State))[8]
# remove DC from 2016 data
data.2016=filter(data.2016,State!="DC")      
states==sort(unique(data.2016$State))
data.2016$State[35]="Louisiana"
states==sort(unique(data.2016$State))
data.2017=filter(data.2017,State!="District of Columbia")
states==sort(unique(data.2017$State))
data.2018=filter(data.2018,State!="District of Columbia")
data.2018$State[9]="Wyoming"
states==sort(unique(data.2018$State))
data.2019=filter(data.2019,State!="District of Columbia"&State!="")
states==sort(unique(data.2019$State))

mh.data=rbind(data.2016,data.2017,data.2018,data.2019,data.2020)
mh.data$Year=mh.data$Year-1      
mh.data$Percent[51]=42.1
mh.data=filter(mh.data,State!="District of Columbia")
write.csv(mh.data,"mh.data.csv")
}





mh.data=read.csv("mh.data.csv",stringsAsFactors = FALSE)%>%
  clean_names()%>%
  select(-x)%>%
  left_join(states, by = c("state"="state.name"))%>%
  select(-state)%>%
  rename(state = state.abb)

CDC Survey

The literature reviews stipulates that mental health, school satiety and bullying all play a vital roles in school shooting incidences. To analyze the dynamics of school shootings, we gathered data from a CDC survey given to students in 37 states biannually from 1991 to 2015. The data were obtained from the R package yrbss. The dataset includes proportions of students answering “yes” to a number of questions. We selected and included in the analysis the following questions:

  1. Carried a weapon
  2. Carried a gun
  3. Carried a weapon on school property
  4. Did not go to school because they felt unsafe at school or on their way to or from school
  5. Were threatened or injured with a weapon on school property
  6. Were in a physical fight
  7. Were injured in a physical fight
  8. Were in a physical fight on school property
  9. Were bullied on school property
  10. Were electronically bullied
  11. Felt sad or hopeless
  12. Seriously considered attempting suicide
  13. Made a plan about how they would attempt suicide
  14. Attempted suicide
  15. Attempted suicide that resulted in an injury, poisoning, or overdose that had to be treated by a doctor or nurse
  16. Been teased because of weight in the past 12 months
  17. Been teased because of being labeled GLB in the past 12 months

Because many of these questions cover similar topics, we chose to combine the variables into four main categories and took weighted averages (weight by the number of respondents) to obtain a final score for each category. The categories are classified as follows:

  1. Suicidal thoughts or actions (measure of mental health):
    • Felt sad or hopeless, Seriously considered attempting suicide
    • Made a plan about how they would attempt suicide
    • Attempted suicide
    • Attempted suicide that resulted in an injure, poisoning, or overdose.
  2. School safety :
    • Carried a weapon
    • Carried a gun
    • Carried a weapon on school property
    • Did not go to school because they felt unsafe at school or on their way to or from school
    • Were threatened or injured with a weapon on school property.
  3. Involvement in physical altercations:
    • Were in a physical fight
    • Were injured in a physical fight
    • Were in a physical fight on school property
  4. Bullying:
    • Were bullied on school property
    • Were electronically bullied
    • Been teased b/c of weight past 12 months
    • Been teased b/c labeled GLB past 12 months

Data for questions regarding bullying were not present in the dataset prior to 2010. We assume the surveys were updated to include measures of bullying after the initial writing of the survey. Thus we only included the data from the year 2010. The dataset also contains data for every other year. So for the interpolation, we imputed the missing observation with the mean of the previous and next year values. As the dataset contains data up to the year 2016 and our school shooting dataset contained data till 2018, we imputed the values for those years using the predicted values of the linear regression model. To get an overall score for each state over all the years, we calculated the median score for each state. Though the mean and the median score did not appear to be much different, the mean may be affected by extreme values.

# questions 
qns = c("qn13","qn14","qn15","qn16","qn17","qn18","qn19","qn20","qn24","qn25","qn26","qn27","qn28","qn29","qn30","qnbullyweight","qnbullygay")
# paricitpating states
sts = yrbss::getListOfParticipatingStates()
# years in data
yrs = c(1991,1993,1995,1997,1999,2001,2003,2005,2007,2009,2011,2013,2015)

if(import_raw_data){
  # make data frame to store values
bullying = data.frame(variable = NULL, state = NULL, year = NULL, prop= NULL, ciLB= NULL, ciUB = NULL, n = NULL)

# run over all questions, states, and years
for(v in qns){
  for(s in sts){
    for(y in yrs ){
      p = yrbss::getProportionSingleVariable(.variable = v, .location = s, .year = y, .level = 0.95)$prop
    ciL = yrbss::getProportionSingleVariable(.variable = v, .location = s, .year = y, .level = 0.95)$ciLB
    ciU = yrbss::getProportionSingleVariable(.variable = v, .location = s, .year = y, .level = 0.95)$ciUB
    sample_size = yrbss::getProportionSingleVariable(.variable = v, .location = s, .year = y, .level = 0.95)$n
    newrow = data.frame(variable = v, state = s, year = y, prop= p, ciLB= ciL, ciUB = ciU, n = sample_size)
    bullying = bind_rows(bullying,newrow)}
}}

# incorporate labels for questions (what the question actually is)
lab = data.frame(question = yrbss::yrbss_questions_binary$variable, label = yrbss::yrbss_questions_binary$label)%>%
   filter(question %in% qns)
 
bullying = left_join(bullying,lab, by = c(variable = "question"))
# clean Arizona label
bullying$state[which(bullying$state=="AZB")]="AZ"

# write cleaned csv file
write_csv(bullying, "bullying_survey.csv")
}

# incorporate labels for questions (what the question actually is)
lab = data.frame(question = yrbss::yrbss_questions_binary$variable, label = yrbss::yrbss_questions_binary$label)%>%
   filter(question %in% qns)

# read in csv
bullying_survey = read_csv("bullying_survey.csv")


# Make combined variables for survey questions (specified in text above)

# combine suicide questions 
suicide = data.frame(year = NULL, state = NULL, score = NULL, n = NULL)
suicide_qns = bullying_survey%>%
  filter(variable %in% c("qn26","qn27","qn28","qn29","qn30"))

sts[which(sts=="AZB")]="AZ"
for(y in yrs){
  for(st in sts){
    dat = filter(suicide_qns,(year ==y & state == st))
    # score is weighted average of proportions (weighted by number of respondents)
    suicide_score = weighted.mean(dat$prop,(dat$n/sum(dat$n)), na.rm = TRUE)
    samp = mean(dat$n,na.rm = TRUE)
    newrow = data.frame(year = y, state = st, score = suicide_score, n = samp)
    suicide = bind_rows(suicide,newrow)
  }
}

# combine questions involving physical fights
fight = data.frame(year = NULL, state = NULL, score = NULL, n = NULL)
fight_qns = bullying_survey%>%
  filter(variable %in% c("qn18","qn19","qn20"))

for(y in yrs){
  for(st in sts){
    dat = filter(fight_qns,(year ==y & state == st))
    # score is weighted average of proportions (weighted by number of respondents)
    fight_score = weighted.mean(dat$prop,(dat$n/sum(dat$n)),, na.rm = TRUE)
    samp = mean(dat$n,na.rm = TRUE)
    newrow = data.frame(year = y, state = st, score = fight_score, n = samp)
    fight = bind_rows(fight,newrow)
  }
}

# school safety questions
safety = data.frame(year = NULL, state = NULL, score = NULL, n = NULL)
safety_qns = bullying_survey%>%
  filter(variable %in% c("qn13","qn14","qn15","qn16","qn17"))

for(y in yrs){
  for(st in sts){
    dat = filter(safety_qns,(year ==y & state == st))
    # score is weighted average of proportions (weighted by number of respondents)
    safety_score = weighted.mean(dat$prop,(dat$n/sum(dat$n)), na.rm = TRUE)
    samp = mean(dat$n,na.rm = TRUE)
    newrow = data.frame(year = y, state = st, score = safety_score, n = samp)
    safety = bind_rows(safety,newrow)
  }
}

# bullying questions
bull = data.frame(year = NULL, state = NULL, score = NULL, n = NULL)
bully_qns = bullying_survey%>%
  filter(variable %in% c("qn24","qn25","qnbullyweight","qnbullygay"))

for(y in yrs){
  for(st in sts){
    dat = filter(bully_qns,(year ==y & state == st))
    # score is weighted average of proportions (weighted by number of respondents)
    bully_score = weighted.mean(dat$prop,(dat$n/sum(dat$n)), na.rm = TRUE)
    samp = mean(dat$n,na.rm = TRUE)
    newrow = data.frame(year = y, state = st, score = bully_score, n = samp)
    bull = bind_rows(bull,newrow)
  }
}



# create table with new variables
survey_combined = bind_cols(suicide,safety, fight, bull)%>%
  select(year, state, score, n, score1, n1, score2, n2, score3, n3)%>%
  rename(suicide_score = score, school_safety_score = score1, fight_score = score2, bully_score = score3)%>%
  filter(year>=2007)%>% 
  full_join(st_yr%>%filter(year>=2007))%>%
  arrange(year)%>%
  arrange(state)%>%
  mutate(state = ifelse(state == "AZB","AZ",state))


# impute values
# average value of previous and next year (for years with such data)
# for 2016-2019 predict value from line between 2014 and 2015 points
survey_combined = survey_combined%>%
  group_by(state)%>%
  mutate(suicide_score =ifelse(year<2016,ifelse(is.na(suicide_score), na.locf(suicide_score)/2+na.locf(suicide_score, fromLast = TRUE)/2,suicide_score),
                              NA ))%>%
  mutate(school_safety_score =ifelse(year<2016,ifelse(is.na(school_safety_score), na.locf(school_safety_score)/2+na.locf(school_safety_score, fromLast = TRUE)/2,school_safety_score),NA))%>%
    mutate(fight_score =ifelse(year<2016,ifelse(is.na(fight_score), na.locf(fight_score)/2+na.locf(fight_score, fromLast = TRUE)/2,fight_score),NA))%>%
      mutate(bully_score =ifelse(year<2016,ifelse(is.na(bully_score), na.locf(bully_score)/2+na.locf(bully_score, fromLast = TRUE)/2,bully_score),NA))%>%
        mutate(n=ifelse(year<2016,ifelse(is.na(n), na.locf(n)/2+na.locf(n, fromLast = TRUE)/2,n),NA))%>%
          mutate(n1=ifelse(year<2016,ifelse(is.na(n1), na.locf(n1)/2+na.locf(n1, fromLast = TRUE)/2,n1),NA))%>%
          mutate(n2=ifelse(year<2016,ifelse(is.na(n2), na.locf(n2)/2+na.locf(n2, fromLast = TRUE)/2,n2),NA))%>%       
  mutate(n3=ifelse(year<2016,ifelse(is.na(n3), na.locf(n3)/2+na.locf(n3, fromLast = TRUE)/2,n3),NA))


# use fitted line to get 2016-2019 values 
for(s in sts[which(!(sts %in% c("MA","AZB","DC")))]){
  dat = filter(survey_combined, state ==s)
  ind = which(is.na(dat$suicide_score))[1]
  slp = dat[(ind-1),3:10]-dat[(ind-2),3:10]
  dat[which(is.na(dat$suicide_score))[1],3:10] =  dat[(ind-1),3:10]+as.numeric(dat[which(is.na(dat$suicide_score))[1],1]-2015)*slp
    dat[which(is.na(dat$suicide_score))[1],3:10] =  dat[(ind-1),3:10]+as.numeric(dat[which(is.na(dat$suicide_score))[1],1]-2015)*slp  
    dat[which(is.na(dat$suicide_score))[1],3:10] =  dat[(ind-1),3:10]+as.numeric(dat[which(is.na(dat$suicide_score))[1],1]-2015)*slp  
    dat[which(is.na(dat$suicide_score))[1],3:10] =  dat[(ind-1),3:10]+as.numeric(dat[which(is.na(dat$suicide_score))[1],1]-2015)*slp

  survey_combined[which(survey_combined$state == s),]=dat
}

# chech to see if mean and median are different
med_mn = survey_combined%>%
  group_by(year)%>%
  summarise(med_ss = median(suicide_score, na.rm = T), mn_ss = mean(suicide_score, na.rm = T),
            med_safe = median(school_safety_score, na.rm = T), mn_safe = mean(school_safety_score, na.rm = T),
            med_f = median(fight_score, na.rm = T), mn_f = mean(fight_score, na.rm = T),
            med_b = median(bully_score, na.rm = T), mn_b = mean(bully_score, na.rm = T))

# impute missing values with median value for each column

survey_combined = survey_combined%>%
  ungroup()%>%
  group_by(year)%>%
  mutate(suicide_score = ifelse(is.na(suicide_score), median(suicide_score, na.rm = T),suicide_score))%>%
  mutate(school_safety_score = ifelse(is.na(school_safety_score), median(school_safety_score, na.rm = T),school_safety_score))%>%
  mutate(fight_score = ifelse(is.na(fight_score), median(fight_score, na.rm = T),fight_score))%>%
  mutate(bully_score = ifelse(is.na(bully_score), median(bully_score, na.rm = T),bully_score))%>%
  mutate(n = ifelse(is.na(n), median(n, na.rm = T),n))%>%
  mutate(n1 = ifelse(is.na(n1), median(n1, na.rm = T),n1))%>%
  mutate(n2 = ifelse(is.na(n2), median(n2, na.rm = T),n2))%>%
  mutate(n3 = ifelse(is.na(n3), median(n3, na.rm = T),n3))%>%
  mutate(school_safety_score = ifelse(school_safety_score<0,0,school_safety_score))%>%
  mutate(fight_score = ifelse(fight_score<0,0,fight_score))

Poverty

The financial aspect of each state may have an influence on the number of school shootings that took place each year. Thus we decided to account for the state economic condition, we would include the poverty prevalence rate for each state over the study period. The dataset was obtained from __________________. The dataset contains poverty rate and 90% confidence interval of poverty rate for each state from the year 2010 to 2016.

# read in poverty data

pov=read.csv("census_poverty_data.csv",stringsAsFactors = FALSE)
pov=filter(pov,State!="      United States"&State!="     United States"& State!="District of Columbia"&State!="United States")

names(pov)[1:2]<-c("state","poverty")
names(pov)[4]<-"year"

skel=expand_grid(sort(unique(pov$state)),min(pov$year):(max(pov$year)+2))
names(skel)<-c("state","year")


# Each data point is a three-year average. We assign this average to all years which were combined to make the three year average
sk1=filter(skel,year==2007|year==2008|year==2009)
sk1=left_join(sk1,filter(pov,year==2007),by="state")

sk2=filter(skel,year==2010|year==2011|year==2012)
sk2=left_join(sk2,filter(pov,year==2010),by="state")

sk3=filter(skel,year==2013|year==2014|year==2015)
sk3=left_join(sk3,filter(pov,year==2013),by="state")

sk4=filter(skel,year==2016|year==2017|year==2018)
sk4=left_join(sk4,filter(pov,year==2016),by="state")

pov=rbind(sk1,sk2,sk3,sk4)
pov=pov[,1:3]
names(pov)<-c("state","year","poverty")

Bullying Laws

Bullying in school has been identified one of the significant causes behind school shootings. Some of states addressed that issue and have implemented several judicial laws to prevent school bullying. Though almost all the states have laws to prevent bullying the degree of rigidity varies from state to state. We obtained a dataset that indicates how restrictive the bullying laws are in different states. The dataset was retrieved from stopbullying.gov. The dataset mentions about 14 different components of Bullying laws that each state may or may not have. Thus out of the total score 14, we calculated a total score for each state based on how many components of bullying laws were present in each state. unfortunately the data was available only for the year 2010. Thus we have included the dataset in the analysis assuming that the score remains the same over the time period

bully = read_csv("State_bullying.csv")%>%
  select(c(total_score,state.abb))%>%
  rename(state = state.abb)%>%
  mutate(year = 2010)

Media

As the literature review indicated, we wished to adjust for the impact of media coverage on the school shooting incidences. The dataset was retrieved from a Nature article about media coverage and firearm acquisition after mass shootings (data available on the following GitHub repository). The dataset contains the number of articles that has been published and number of mass shooting in a given month and year. The articles are further clustered on the basis of the topics the articles covered such as: “firearm laws and regulations”, “unemployment”and “shootings excluding firearm laws and regulations”. For each year, the total number of articles containing “shootings excluding firearm laws and regulations” and total number of mass shooting was calculated and number of article per incident was calculated from by dividing total number of articles by the total number of mass shooting. As there can be an school shooting in December whose effect can alter the incident count of the next year, to adjust the lag effect we created a new variable that accounts for the number of previous years articles.

media = read_csv("media_data.csv")%>%
  separate(`Year-Month`, into = c("year","month"), sep = "-")%>%
  clean_names()%>%
  select(-starts_with("x"))%>%
  group_by(year) %>%
  mutate(yearly_articles = sum(articles_shootings_excluding_firearm_laws_and_regulations))%>% 
  mutate(yearly_shootings = sum(mass_shooting))%>%
  mutate(art_per_inc = yearly_articles/yearly_shootings)%>%
  ungroup()%>%
  mutate(prev_month_art = c(0,articles_shootings_excluding_firearm_laws_and_regulations[1:227]))%>%
  mutate(prev_year_art = c(rep(0,12),yearly_articles[1:216]))%>%
  mutate(year = as.numeric(year))

Population

It is well known that states vary widely in terms of population size. Because of this–and because we are looking at event data with number of events as our outcome of interest–we must take population into account in any analysis we do, since states with much higher populations are far more likely to experience events than smaller states. We obtained state population data from the U.S. Census Bureau. The dataset contained populations by over the years 1970 to 2019.

if(import_raw_data){

# 1980-1990
pop8090_raw = read.delim("https://www2.census.gov/programs-surveys/popest/tables/1980-1990/state/asrh/st8090ts.txt" )%>%data.frame()
pop8090_raw = pop8090_raw[c(8:58,64:114),]%>%data.frame()
colnames(pop8090_raw)="var"
pop8090 = data.frame(pop8090_raw$var[1:51],pop8090_raw$var[52:102])

colnames(pop8090)=c("var1","var2")


pop8090 = pop8090%>%
  separate(var1, c("state",as.character(seq(1980,1984,by = 1))), extra = "merge")%>%
  separate(var2, c("state1",as.character(seq(1985,1990,by = 1))), extra = "merge")%>%
  select(c(state,starts_with("19")))
  
# 1970 - 1980
pop7080_raw = read.delim("https://www2.census.gov/programs-surveys/popest/tables/1980-1990/state/asrh/st7080ts.txt")
pop7080_raw = pop7080_raw[c(10:61,63:114),]%>%data.frame()
colnames(pop7080_raw)="var"
pop7080 = data.frame(pop7080_raw$var[1:51],pop7080_raw$var[53:103])

colnames(pop7080)=c("var1","var2")

pop7080 = pop7080%>%
  separate(var1, c("blnk","id","state","1970","1971","1972","1973","1974","1975"), extra = "merge")%>%
  separate(var2, c("blnk1","id1","state1",as.character(seq(1976,1980,by = 1))), extra = "merge")%>%
  select(c(state,starts_with("19")))

# 1990-2000
pop90_00 = read_csv("../../data/pop_90-00.csv")%>%
  rename(state = X1)%>%
  select(c(state,3:13))%>%
  filter(state != "USA")
colnames(pop90_00)[2:ncol(pop90_00)]=as.character(seq(1990,2000,by = 1))
pop90_00$state[30:35] = states$state.name[29:34]
pop90_00$state[40:42] = states$state.name[39:41]
pop90_00$state[9] = states$state.name[51]
pop90_00$state[49] = states$state.name[48]

# 2000-2010
pop00_10 = read.csv("https://www2.census.gov/programs-surveys/popest/datasets/2000-2010/intercensal/state/st-est00int-agesex.csv")%>%
  clean_names()%>%
  filter(sex == 0, age == 999)%>%
  select(c(name,starts_with("popestimate")))%>%
  rename_at(vars(starts_with("popestimate")),funs(substr(.,start = 12, stop = 15)))%>%
  rename(state = name)%>%
  mutate(state = as.character(state))%>%
  mutate(state = ifelse(state=="District of Columbia",as.character(states$state.name[51]),state))%>%
  filter(state != "United States")

# 2010-2018
pop10_18 = read_csv("../../data/pop_2010-2018.csv")
colnames(pop10_18)=pop10_18[1,]

pop10_18 = pop10_18%>%
  slice(-1)%>%
  rename_at(vars(starts_with("Pop")), funs(substr(.,start = 38,stop = 41)))%>%
  rename(state = Geography)%>%
  select(c(state,starts_with("20")))%>%
  mutate(state = ifelse(state=="District of Columbia",as.character(states$state.name[51]),state))

# 2019

pop2019 = read_csv("../../data/2019pop.csv")%>%
  clean_names()%>%
  select(c(state,pop))%>%
  rename("2019"=pop)


# combine all data
pops = left_join(pop7080,pop8090%>%select(-c("1980")), by = "state")%>%
  left_join(states,by = c("state"="state.abb"))%>%
  mutate(state = state.name)%>%
  select(-state.name)%>%
  left_join(pop90_00%>%select(-c("1990")), by = "state")%>%
  left_join(pop00_10%>%select(-c("2000")), by = "state")%>%
  left_join(pop10_18%>%select(-c("2010")), by = "state")%>%
  left_join(pop2019,by = "state")%>%
  gather(key = "year", value = "population", c(2:51))

# write cleaned csv file
write.csv(pops, "state_populations.csv")

}

# read in population data (include abbreviated and full state name)
pops = read_csv("state_populations.csv")%>%
  select(-X1)%>%
  left_join(states, by = c("state"="state.name"))

Aggregating the data

Because of the nature of the school shooting data (each row is one event), we must aggregate the data into a form that we can analyze. We create a new data set with each row representing the data for one state in a given year, with number of school shootings in that state/year as the main outcome of interest. We then merge all other data sets to get the final form of the data we will use for our remaining analysis.

# data set aggrgated by state and year
ss_counts = ss%>%
  group_by(state,year) %>%
  summarize(incident_count = n(), 
            total_victims = sum(total_injured_killed_victims), #include possible secondary outcomes
            total_fatalities = sum(killed_includes_shooter),
            total_wounded = sum(wounded),
            avg_victims = mean(total_injured_killed_victims),
            avg_fatalities = mean(killed_includes_shooter),
            ave_wounded = mean(wounded),
            avg_shooter_age = mean(shooter_age),
            max_shooter_age = max(shooter_age),
            min_shooter_age = min(shooter_age),
            avg_reliability = mean(reliability_score_1_5),
            target = mean(ifelse(targeted_specific_victim_s=="Y",1,
                                 ifelse(targeted_specific_victim_s=="N",0,NA)),na.rm = T),
            random_vict = mean(ifelse(random_victims=="Y",1,
                                      ifelse(random_victims=="N",0,NA)),na.rm = T),
            bullied = mean(ifelse(bullied_y_n_n_a=="Y",1,
                                  ifelse(bullied_y_n_n_a=="N",0,NA)),na.rm = T),
            domestic_violence = mean(ifelse(domestic_violence_y_n=="Y",1,
                                            ifelse(domestic_violence_y_n=="N",0,NA)),na.rm = T))%>%
  mutate(in_ss = TRUE)%>%
  full_join(st_yr, by = c("state","year"))%>%
  left_join(pops%>%rename(full_state_name = state), by = c("state"="state.abb", "year"))%>%
  mutate(incident_count = ifelse(is.na(incident_count),0,incident_count))%>%
  mutate(in_ss = ifelse(is.na(in_ss),FALSE,in_ss))

# merge in grade data
ss_counts = ss_counts%>%
  left_join(grades%>%select(state,grade,year),by = c("full_state_name"="state","year"))%>%
  left_join(gpa_convert, by = c("grade"="letter_grade"))

# merge in media data
ss_counts = ss_counts%>%
  left_join(media%>%select(prev_year_art, year,yearly_articles)%>%distinct(), by = "year")

# merge in poverty data
pov = left_join(pov,states, by = c("state"="state.name"))%>%
  select(-state)%>%
  rename(state = state.abb)
ss_counts = left_join(ss_counts, pov, by = c("state","year"))

# merge bullying data
ss_counts = left_join(ss_counts, bully%>%select(-year), by = "state")%>%
  rename(bullying_score = total_score)%>%
  mutate(bullying_score = as.numeric(bullying_score))

# merge mental health data
ss_counts = ss_counts%>%
  left_join(mh.data, by = c("state","year"))%>%
  rename(mh_score = percent)

# merge CDC survey data
ss_counts = left_join(ss_counts, survey_combined, by = c("state","year"))%>%
  filter(state != "DC")

# merge RAND data
ss_counts = left_join(ss_counts,rand.count.data, by = c("state","year"))

# create new variables of law categories
# -1s are from after realizaing I accidentally marked permissive laws as positive and and restrictive laws as negative... whoops :) 
ss_counts = ss_counts%>%
  mutate(access_laws = -1*(age_permissive-age_restrictive+child_access_permissive-child_access_restrictive+prohibited_possessor_permissive-prohibited_possessor_restrictive))%>%
  mutate(transport= -1*(castle_doctrine_permissive-castle_doctrine_restrictive+open_carry_permissive-open_carry_restrictive+carrying_permissive-carrying_permissive))%>%
  mutate(screening = -1*(background_check_permissive- background_check_restrictive + waiting_period_permissive- waiting_period_restrictive + permit_permissive - permit_restrictive + registration_permissive - registration_restrictive + safety_training_permissive - safety_training_restrictive + dealer_license_permissive - dealer_license_restrictive + one_gun_per_permissive - one_gun_per_restrictive))%>%
  mutate(ban = -ban_permissive+ban_restrictive)%>%
  select(-c(36:63))

EDA

Once all data were obtained we next made exploratory plots to understand the nature of the data and look for trends that will be important in later modeling and statistical analysis. First, we looked at the trend of school shootings over time by plotting the number of incidents per year. We can see that the general trend is that school shooting incidents are increasing over time with a huge spike in the latter half of the past decade. *Note that the last date recorded in our data is November 11th, 2019 and hence we do not have a full year of data for 2019.

ss_counts%>%
  group_by(year)%>%
  summarise(ss = sum(incident_count))%>%
  ggplot(aes(x = year, y = ss))+
  geom_line(color = "red")+
  labs(x = "Year", y = "School Shootings", title = "School Shootings in the U.S.")+
  theme_bw()+
  theme(plot.title = element_text(hjust = 0.5))

ss_counts%>%
  mutate(grade = substr(grade,1,1))%>% # get rid of plus/minus grades (round to letter grade)
  group_by(grade)%>%
  summarise(count = sum(incident_count), # total incidents in states with same grade (all years)
            pop = sum(population))%>% # total population in states with same grade (all years)
  filter(!is.na(grade))%>% # only years with grade value
  ggplot(aes(x = grade, y = (count/pop)*330000000, fill = grade))+
  geom_bar(stat = "identity")+
  scale_fill_viridis_d(direction = -1)+
  theme_classic()+
  labs(y = "Normalized School Shooting Count", x = "Grade", title = "School Shootings by Gun Policy Strength")+
  theme( legend.position = "none", plot.title = element_text(hjust = 0.5))

Gun Policy by State

We next look at how these school shootings occur with respect to gun policy. Below is a plot showing the number of school shootings in each state from the years 2012-2018, where each state is colored by the gun policy grade it received that year (we took grades as letter grades without +/-). As we can see, in the early years it seems that the states with poor gun policy grades account for the most school shootings. Interestingly, in 2018 there seems to be a trend reversal, with states receiving high grades experiencing a high number of school shootings.

# This will handle the ordering
d <- ss_counts %>% 
  filter(year %in% seq(2012,2018,by=1))%>%
  mutate(grade = substr(grade,1,1))%>%
  ungroup() %>%   # As a precaution / handle in a separate .grouped_df method
  arrange(year, grade) %>%   # arrange by facet variables and continuous values
  mutate(.r = row_number()) # Add a row number variable

ggplot(d, aes(x = .r, y= incident_count, fill = grade)) +  # Use .r instead of x
  geom_point(data = d%>%filter(incident_count==0),
             aes(x = .r, y= incident_count, color = grade),
             size = 0.7) +
  geom_bar(stat = "identity")+
  facet_wrap(~ year, scales = "free_x") +  # Should free scales (though could be left to user)
  scale_x_continuous(  # This handles replacement of .r for x
    breaks = d$.r,     # notice need to reuse data frame
    labels = d$state
  )+
  labs(title = "Incident Count by Grade", x = "State", y = "School Shootings")+
  scale_fill_viridis_d( name = "Grade", direction = -1)+
  scale_color_viridis_d( guide = FALSE, direction = -1)+
  theme_bw()+
  theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 3.5), legend.position = c(0.7,.15), legend.direction = "horizontal",legend.key.size = unit(1.5,"line"), legend.text = element_text(size = 8), legend.title = element_text(size = 10),plot.title = element_text(hjust = 0.5))

We next look at how state gun policy grades and number of gun laws in effect (by category) change over time. We plot the grades on a 4.0 GPA scale. Gun laws in effect is calculated for each state and year as \(# Restrictive laws - # Permissive laws\). In these animations we can see that while some states’ grades and laws experience large changes over time, many do not (this can be seen particularly well in ban and transport laws that seem relatively stable over the time period 2007-2019).

# variable to create animations (takes a few seconds so I saved files in the directory rather than building them each time I knit the code)
build_animations = FALSE

if(build_animations){
# convert letter grade to gpa
grades = left_join(grades, gpa_convert, by = c("grade"="letter_grade"))

# get long/lat data
dat_loc= map_data("state")%>%
  mutate(sts=toupper(region))

# merge data
b=full_join(grades%>%mutate(state = tolower(state)), dat_loc, by=c("state"="region"))
b = b%>%mutate(year = as.numeric(year))%>%filter(!is.na(year))%>%select(-subregion)
b = b[complete.cases(b),]

# create plot to animate
map_plts = ggplot(data = b, aes(x = long, y = lat, group = group))+
  geom_polygon(aes(fill = gpa), color = "white")+
  scale_fill_viridis(option = "D", name = "GPA")+
  labs(title = "Grade Changes by Year", subtitle = 'Year: {frame_time}', x = NULL, y = NULL)+
  theme_bw()+
  theme(line = element_blank(), axis.text.x = element_blank(), axis.text.y = element_blank(), plot.title = element_text(hjust = 0.5))+
  transition_time(year, range = c(2012,2018))

# run animation
animate(map_plts, nframes = 7, fps = 1)

# save as file to save time when running next time
anim_save("state_grade_animation.gif")
}
if(build_animations){
# long/lat data
dat_loc = left_join(dat_loc,states%>%mutate(state.name = tolower(state.name)), by = c("region"="state.name"))
b2=full_join(ss_counts%>%ungroup()%>%filter(year>=2007)%>%filter(!(state%in%c("AK","HI")))%>%select(state,year,access_laws,transport,screening,ban), dat_loc, by=c("state"="state.abb"))%>%
  select(-subregion)%>%
  gather(key = "law_class", value = "number_in_effect", c(3:6))
b2 = b2[complete.cases(b2),]

# facet labels for cleaner plots
law.labs <- c("Access Laws", "Bans","Screening Laws", "Transport Laws")
names(law.labs) <- c("access_laws", "ban","screening","transport")

# create plots for animation
map_plts_law = ggplot(data = b2, aes(x = long, y = lat, group = group))+
  geom_polygon(aes(fill = number_in_effect), color = "white")+
  scale_fill_viridis(option = "D", name = "Laws in\nEffect\n(R-P)")+
  labs(title = "Law Changes by Year",subtitle = 'Year: {frame_time}', x = NULL, y = NULL)+
  theme_bw()+
  facet_wrap(.~law_class, labeller = labeller(law_class = law.labs))+
  theme(line = element_blank(), axis.text.x = element_blank(), axis.text.y = element_blank(), plot.title = element_text(hjust = 0.5))+
  transition_time(year, range = c(2007,2019))

# run animation 
animate(map_plts_law, nframes = 13, fps = 1)

# save file
anim_save("state_law_animation.GIF")
}

Media Coverage

Another variable we wished to control for was media coverage of shooting events in the United States. Below we see that as the years have progressed, there has been a general increase in the number of articles published about shootings. Here we are using the number of articles in the New York Times and Washington Post as a proxy for general media coverage trends. We hypothesize that the increase in media coverage of mass shootings may help explain the increase in school shootings, as increased public awareness of these shootings may unintentionally normalize these events or even inspire further violence. Hence we also plot the number of school shootings colored by the number of articles about shootings published in the previous year. If the aforementioned hypothesis is true, we would expect to see higher bars (indicating more school shootings) in years in which the previous year had a lot of articles published about shootings. While there does seem to be some trend in the data, it is not particularly clear if this is the case or not.

# plot number of artciles per month over time 
media%>%
    mutate(date = as.Date(zoo::as.yearmon(paste(year,"-",month, sep = ""))))%>%
  ggplot(aes(y = articles_shootings_excluding_firearm_laws_and_regulations, x= date, color = year))+
  geom_point()+
  geom_jitter()+
  labs(y = "Monthly Articles",x =  "Year", title = "Media Coverage of Shootings", subtitle = "Articles in NYT and Washington Post Regarding Shootings")+
  scale_color_viridis_c(name = "Year")+
  theme_bw()+
  theme(plot.title = element_text(hjust = 0.5), plot.subtitle = element_text(hjust = 0.5, size = 8))

# plot school shooting counts per year colored by number of articles in previous year
ss_counts%>%
  group_by(year)%>%
  summarise(yearly_incidents = sum(incident_count), prev_year_art = prev_year_art[1] )%>%
  filter(year %in% seq(1999,2017, by = 1))%>%
  ggplot(aes(x = year, y = yearly_incidents, fill = prev_year_art))+geom_bar(stat = "identity")+
  labs(y = "Yearly School Shootings", x = "Year", title = "Media Coverage Lag Effect")+
  scale_fill_viridis(option = "D", name = "Articles in previous year")+
  theme_bw()+
  theme(plot.title = element_text(hjust = 0.5))

YRBSS Survey

Next we look at student responses from the CDC YRBSS surveys. We plot the data (not imputed) using the aggregated variables we defined previously (4 categories: bullying, physical altercations, school safety, suicide) to see trends across time and states. We can see that these scores seem to stay relatively consistent across time with only some minor changes; validation our choice to impute values with the mean of adjacent years and using linear trajectory after 2015. There seems to be more variability across states (see boxplots). It is important to note that this suggests that our choice to impute data for unobserved states with the median of all other states’ data (by year) will underestimate variance. This will be important to recall when interpreting the model results for these variables.

# retrieve origiial data without imputed data
survey_combined_raw = bind_cols(suicide,safety, fight, bull)%>%
  select(year, state, score, n, score1, n1, score2, n2, score3, n3)%>%
  rename(suicide_score = score, school_safety_score = score1, fight_score = score2, bully_score = score3)%>%
  filter(year>=2007)%>%
  full_join(st_yr%>%filter(year>=2007))%>%
  arrange(year)%>%
  arrange(state)%>%
  mutate(state = ifelse(state == "AZB","AZ",state))

# cleaner facet labels
surv.labs = c("Bullying Score", "Physical Altercation Score", "School Danger Perception Score", "Suicidal Thoughts or Actions Score")
names(surv.labs)= c("bully_score", "fight_score", "school_safety_score","suicide_score")

# plot survey responses over time 
survey_combined_raw%>%
  gather(key = "score_type", value = "percent_yes", c(3,5,7,9))%>%
  ggplot(aes(x = year, y = percent_yes, color = score_type))+
  geom_point(position = "jitter")+
  geom_smooth(se=F)+
  scale_color_viridis_d(name = "Score Type", labels = c("Bullying\n","Involvment in \nPhysical Altercation\n","School Danger\nPerception\n","Suicidal \nThoughts/Actions\n"))+
  xlim(2007,2015)+
  labs(y = "% Responding Yes", x = "Year", title = "Survey Responses", subtitle = "(Aggregated Variables)")+
  theme_bw()+
  theme(plot.title = element_text(hjust = 0.5), plot.subtitle = element_text(hjust = 0.5, size = 8))

# plot survey responses variation by state
survey_combined_raw%>%
  gather(key = score_type, percent, c(3,5,7,9))%>%
  ggplot(aes(x = state, y = percent, color = score_type ))+
  geom_boxplot()+
  scale_color_viridis_d()+
  labs(title = "Survey Reponses by State", x = "State", y = "Percent Answering 'Yes'")+
  facet_wrap(.~score_type,labeller = labeller(score_type = surv.labs))+
  theme_bw()+
  theme(legend.position = "none",axis.text.x = element_text(angle = 45, hjust = 1, size = 4.5),plot.title = element_text(hjust = 0.5))

Mental Health Data

When we look further into the mental health data, we realize there are many issues with these data. For one, the survey years stated on the Mental Health America website for each year’s report sometimes overlap, but do not share the same values. Further, the reported percentages do not always change from year to year. The below plot shows an example of this, using the year prior to the report as the survey year.

mh.data=mh.data%>%clean_names()
ggplot(data=mh.data,mapping=aes(x=year,y=percent,col=state))+geom_line()+labs(title="Percent of Youth Who Experienced an MDE and Did Not Receive Treatment",x="Year",y="Percent")

Note that 2015 and 2016 show no change in values. We also notice that there does not appear to be any pattern in the percent of youth experiencing a major depressive episode and not receiving treatment by state or time. This observation, along with the lack of clarity on the appropiate survey years for the data and that two years’ reports had the same values led us to conclude that this data should not be included in our models.

Model Fitting

We next turn to fitting a model to explain the incidence of school shootings in the United States. We first choose to fit a Poisson generalized linear model (GLM) with school shooting count (by state and year) as the outcome of interest. Poisson GLMs are commonly-used for count data, which is why we started with these models. Our choice was then validated by a number of model diagnostic/goodness-of-fit tests (presented below).

Model Using State Grades

We include the following variables in the model (each varies by time and/or state):

  • State gun policy grade (converted to 4.0 GPA scale)
  • Percent of population living below the poverty line
  • Strength of anti-bullying laws
  • Media coverage in the previous year (number of NYT and WP articles)
  • Time (year)
  • Survey suicide score
  • Survey physical altercations score
  • Survey school safety score (student perception of danger)
  • Survey bullying score (proportion of students experiencing bullying)
  • Population (log scale)

Note: population was not used as a variable but as an offset term in the model (coefficient not estimated but set to be 1). Note we also have one year’s worth of data for the “strength of anti-bullying laws” variable so we use it just as a fixed state effect (doesn’t vary over time). Media data only varies by year, not by state.

More formally, the model we fit is:

\[\log(E[I_{t,s}])=\beta_0+\beta_1\cdot g_{t,s}+\beta_2\cdot p_{t,s}+\beta_3\cdot l_{t,s}+\beta_4\cdot b_{s}+\beta_5\cdot s_{t,s}+\\\beta_6\cdot y_{t,s}+\beta_7\cdot f_{t,s}+ \beta_8\cdot t+ \beta_9 \cdot m_{t-1} +\log( n_{t,s}) \]

where here for each state \(s\) and year \(t\) we have that \(I_{t,s}\) is the school shooting incident count, \(g_{t,s}\) is the grade, \(p_{t,s}\) is the percent of the population living in poverty, \(b_{t,s}\) is the survey-derived bullying score, \(l_{s}\) is the state anti-bullying law score (does not vary by year), \(s_{t,s}\) is the survey-derived suicide score, \(t\) is time (year), \(f_{t,s}\) is the survey-derived physical altercation score, \(y_{t,s}\) is the survey derived school danger score, \(m_t\) is the media score (number of articles) the previous year, and \(n_{t,s}\) is the population.

Model Results

Below we show the model output. One can see that the only significant covariate in this model is GPA (state gun policy grade). The estimate is negative, as we would hope, as higher GPA implied stricter gun laws and, hopefully, fewer school shootings. And in fact, this is what the model is showing.

# fit model written above
fit_grade = glm(incident_count~gpa+poverty+bullying_score+ prev_year_art+year+suicide_score+fight_score+school_safety_score+bully_score+ offset(log(population)), family = "poisson", data = ss_counts)

# table with model output
kbl = broom::tidy(fit_grade)
kbl$term = c("Intercept","GPA", "Poverty", "Bullying Laws score", "Articles in previous year", "Year","Suicide score", "Fight score" , "School safety score", "Bullying score")
kableExtra::kable( kbl, align = "c" , digits = 4, caption = "GLM Output Using State Policy Grades as Covariates")%>%
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover"))
GLM Output Using State Policy Grades as Covariates
term estimate std.error statistic p.value
Intercept -73.6027 117.6988 -0.6253 0.5317
GPA -0.3189 0.0707 -4.5135 0.0000
Poverty -0.0396 0.0305 -1.2967 0.1947
Bullying Laws score 0.0841 0.0513 1.6380 0.1014
Articles in previous year 0.0005 0.0003 1.5367 0.1244
Year 0.0283 0.0584 0.4846 0.6280
Suicide score 3.4774 4.8977 0.7100 0.4777
Fight score -3.5257 2.6116 -1.3500 0.1770
School safety score -3.7020 3.9987 -0.9258 0.3546
Bullying score 1.7618 3.4415 0.5119 0.6087
# 1- to show estimated drop
exp_change = 1-round(exp(-.3189), digits = 3)

# C.I.
l = 1-round(exp(-.3189 + 1.96 *.0706502 ), digits = 3)
u = 1-round(exp(-.3189 - 1.96 *.0706502 ), digits = 3)


# change from F to A grade (4 point increase in gpa) (1- to show estimated drop)
f_to_a = 1-round(exp(4*-.3189), digits = 4)

# C.I.
l4 = 1-round(exp(4*-.3189 + 1.96*4*.0706502 ), digits = 3)
u4 = 1-round(exp(4*-.3189 - 1.96*4*.0706502 ), digits = 3)

From this model we predict that with a one unit change in GPA, a state would expect to see a 27.3% reduction (95% C.I. (16.5%,36.7%)) in school shootings. This model also implies that increasing GPA by 4 (going from an F to an A) the expected drop in school shootings is 72.1% (95% C.I. (51.4%,84%)). For context, the average number of school shootings per year and state in the data is 0.516.

Model Checking

To check the model we run a number of diagnostic tests. First we do a deviance test and get a p-value of \(p=0.838\) against our null hypothesis that the model fits well (no evidence against model). We also test for any overdispersion and get a p-value of \(p = 0.945\) against the null that there is no overdispersion (no evidence against the null). Lastly we plot diagnostic plots and see that the model fits relatively well.

#check model fit using deviance
p_fit_grade = 1-pchisq( fit_grade$deviance, df = fit_grade$df.residual)

# check for overdispersion
disp_test = AER::dispersiontest(fit_grade,trafo=1)
p_disp = disp_test$p.value
disp_est = disp_test$estimate

# diagnostic plot
plot(DHARMa::simulateResiduals(fit_grade))

GEE

Finally, we wished to see if there was variability between states that we did not already account with significant effects on school shootings. We did this using Generalized Estimating Equations (GEE) taking state as the grouping factor. We used an autoregressive correlation structure within states to take into account fading lag effects across years. We can see below that even when taking state variation into account, GPA still has a significant effect in reducing school shootings. The results are very similar to our previous GLM results.

# fit gee
gee_dat_grade = ss_counts%>%
  select(c(1,2,3,20,22,23,25,26,28,30,32,34))

gee_grade = geeglm(incident_count~gpa+poverty+bullying_score+ prev_year_art+year+suicide_score+fight_score+school_safety_score+bully_score,family=poisson("log") ,offset = log(population),id=as.factor(state), corstr = "ar1", data=gee_dat_grade[complete.cases(gee_dat_grade),]) 
# n = 300 observations

# table with model output
kbl_gee = broom::tidy(gee_grade)
kbl_gee$term = c("Intercept","GPA", "Poverty", "Bullying Laws score", "Articles in previous year", "Year","Suicide score", "Fight score" , "School safety score", "Bullying score")
kableExtra::kable( kbl_gee, align = "c" , digits = 4, caption = "GEE Output Using State Policy Grade as Covariates")%>%
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover"),full_width = F)
GEE Output Using State Policy Grade as Covariates
term estimate std.error statistic p.value
Intercept -132.2346 118.1578 1.2525 0.2631
GPA -0.2778 0.1006 7.6285 0.0057
Poverty -0.0270 0.0475 0.3247 0.5688
Bullying Laws score 0.0722 0.0771 0.8787 0.3486
Articles in previous year 0.0003 0.0003 0.6110 0.4344
Year 0.0572 0.0585 0.9566 0.3281
Suicide score 1.9226 4.8621 0.1564 0.6925
Fight score -1.4986 3.5232 0.1809 0.6706
School safety score -2.5324 3.3665 0.5659 0.4519
Bullying score 4.4356 4.2274 1.1009 0.2941

Model Using State Laws

Because we found a statistically significant effect in the model using state gun policy grade, we wish to look further into this trend by looking specifically at which types of laws might be driving the differences in school shooting rates. Thus, we further remove the “grade” variable in the above model and instead use the combined law count variables described in an earlier section (4 categories : gun access laws, bans, screening laws, and transport laws). Recall that these variables are the difference in the sum of restrictive laws in effect minus the sum of permissive laws in effect for a given state and year (policies beginning or ending in the middle of the year are rounded to the nearest year). Thus, the model is the same except we now split the variable accounting for state gun policy into 4 variables (one for each of the law type categories) and estimate a separate coefficient for each. That is,

\[\log(E[I_{t,s}])=\beta_0+\beta_1 \cdot p_{t,s}+\beta_2\cdot l_{t,s}+\beta_3\cdot b_{s}+\beta_4\cdot s_{t,s}+\beta_5\cdot y_{t,s}+\beta_6\cdot f_{t,s}+ \beta_7\cdot t+ \beta_8 \cdot m_{t-1} +\\ \beta_9 \cdot al_{t,s}+\beta_{10} \cdot bl_{t,s}+\beta_{11} \cdot sl_{t,s} +\beta_{12} \cdot sl_{t,s}+ \log( n_{t,s}) \]

where here all variables are the same as above with the addition of \(al_{t,s}\), \(bl_{t,s}\), \(sl_{t,s}\), \(tl_{t,s}\), which represent the access laws, bans, screening laws, and transport laws, respectively, for state \(s\) in year \(t\).

Model Results

Below we see that from this model, year, media, suicide score, fight score, access laws, and bans all have significant effect at an \(\alpha\) level of 0.05. There is also some evidence that transport laws have an effect on school shooting incidence; though, interestingly this slope is positive indicating more transport laws are associated with higher rates of school shootings.

# modeel fit with law counts
fit_law = glm(reformulate(names(ss_counts)[c(2,23,25,26,28,30,32,34,36:39)],names(ss_counts)[3]), offset=log(population), family = "poisson", data = ss_counts)

# table with output
kbl2 = broom::tidy(fit_law)
kbl2$term = c("Intercept","Year","Articles in previous year","Poverty", "Bullying Laws score",  "Suicide score",  "School safety score", "Fight score" ,"Bullying score", "Access Laws", "Transport Laws", "Screening Laws", "Bans")
kableExtra::kable( kbl2, align = "c" , digits = 4, caption = "GLM Output using State Laws as Covariates")%>%
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover"))
GLM Output using State Laws as Covariates
term estimate std.error statistic p.value
Intercept 110.0419 60.4020 1.8218 0.0685
Year -0.0626 0.0301 -2.0814 0.0374
Articles in previous year 0.0009 0.0003 3.0520 0.0023
Poverty -0.0344 0.0226 -1.5207 0.1283
Bullying Laws score 0.0616 0.0422 1.4616 0.1439
Suicide score 10.6968 4.4033 2.4292 0.0151
School safety score -5.6336 3.8883 -1.4489 0.1474
Fight score -6.6711 2.1468 -3.1075 0.0019
Bullying score -3.3383 2.8957 -1.1528 0.2490
Access Laws -0.0666 0.0274 -2.4311 0.0151
Transport Laws 0.0857 0.0520 1.6475 0.0995
Screening Laws 0.0024 0.0292 0.0832 0.9337
Bans -0.2604 0.1226 -2.1230 0.0338

Our model estimates that every addition of a restrictive (or removal of permissive) access law is associated with a 6.4% (95% C.I. (1.3%,11.3%)) drop in school shootings and, likewise, every addition of a ban is associated with a 22.9% decrease in school shootings (95% C.I. (2%,39.4%)).

# predict on raw scale for variables of interest

# bans estimate and conf int
bans_est = paste( round(1-exp(-0.2604), digits = 4)*-100, "%", sep = "")
bans_ci_l = paste( round(1-exp(-0.2604- 1.96*0.1226),digits = 4)*-100, "%", sep = "")
bans_ci_u = paste( round(1-exp(-0.2604+ 1.96*0.1226),digits = 4)*-100, "%", sep = "")

# access laws estimate and conf int
access_est = paste( round(1-exp(-0.0666), digits = 4)*-100, "%", sep = "")
access_ci_l = paste( round(1-exp(-0.0666    -1.96*0.0274), digits = 4)*-100, "%", sep = "")
access_ci_u = paste( round(1-exp(-0.0666    +1.96*0.0274), digits = 4)*-100, "%", sep = "")

# table of estimates on raw scale
kableExtra::kable(data.frame(Variable = c("Access Laws","Bans"), Estimate = c(access_est,bans_est), `C.I Lower`= c(access_ci_l,bans_ci_l),`C.I Upper`= c(access_ci_u,bans_ci_u)  ), digits = 3, caption = "Estimated Change in School Shooting Victims per Unit Increase in Law Type", align = "c")%>%
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover"), full_width = F)
Estimated Change in School Shooting Victims per Unit Increase in Law Type
Variable Estimate C.I.Lower C.I.Upper
Access Laws -6.44% -11.33% -1.28%
Bans -22.93% -39.39% -1.99%

Model Checking

Again, we check our model fit using a variety of methods. For our deviance test we get no significant evidence against our model (\(p = 0.9593\)). We also do not get evidence of overdispesion (\(p = 0.3599\)) with an estimated dispersion parameter of 0.0181. The diagnostic plots show an even better fit than our first model (red lines look more horizontal than in previous model).

#check model fit using deviance
p_fit_law = 1-pchisq( fit_law$deviance, df = fit_law$df.residual)

# check for overdispersion
disp_test_law = AER::dispersiontest(fit_law,trafo=1)
p_disp_law = disp_test_law$p.value
disp_est_law = disp_test_law$estimate

# diagnostic plot
plot(DHARMa::simulateResiduals(fit_law))

# compare to zero-inflated model
z_fit = pscl::zeroinfl(reformulate(names(ss_counts)[c(2,23,25,26,28,30,32,34,36:39)],names(ss_counts)[3]), offset=log(population), dist  = "poisson", data = ss_counts)
## simlilar results

GEE

We, again, use GEE to check for unaccounted-for variability between states. Interestingly, for this set of covariates, the model output does not match our previous output with respect to significance (direction of estimated coefficients still similar). Here none of the types of law show a significant effect on school shootings with p-values all greater than 0.4. We believe one explanation of this is that because these laws do not change drastically from year to year, they may be absorbed into the state effect and hence appear insignificant after allowing for state variation. This does not necessarily mean that these laws do not play a role in school shootings. Rather, this could just be a consequence of the laws’ lack of variation across time, making their effects difficult to distinguish from other within-state variability in the data we have.

# dropped media data because otherwise there wasn't enough data
gee_dat_law = ss_counts%>%
  select(c(1,2,3,20,25,26,28,30,32,34,36:39))

gee_law = geeglm(incident_count ~ year  + poverty + suicide_score + school_safety_score + fight_score + bully_score + access_laws + transport + screening + ban,family=poisson("log"), offset = log(population),id=as.factor(state), corstr = "ar1",
                data=gee_dat_law[complete.cases(gee_dat_law),])
# n = 550 observations

# table with output
kbl2_gee = broom::tidy(gee_law)
kbl2_gee$term = c("Intercept","Year","Poverty", "Suicide score",  "School safety score", "Fight score" ,"Bullying score", "Access Laws", "Transport Laws", "Screening Laws", "Bans")
kableExtra::kable( kbl2_gee, align = "c" , digits = 4,caption = "GEE Output using State Laws as Covariates")%>%
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
GEE Output using State Laws as Covariates
term estimate std.error statistic p.value
Intercept -109.1746 44.7266 5.9582 0.0146
Year 0.0468 0.0222 4.4672 0.0346
Poverty -0.0623 0.0347 3.2199 0.0727
Suicide score 4.8523 3.4369 1.9932 0.1580
School safety score -0.2579 2.6207 0.0097 0.9216
Fight score -4.9753 2.6294 3.5804 0.0585
Bullying score 1.3050 3.3212 0.1544 0.6944
Access Laws -0.0345 0.0492 0.4929 0.4826
Transport Laws 0.0641 0.0772 0.6885 0.4067
Screening Laws -0.0249 0.0453 0.3028 0.5821
Bans -0.0647 0.2027 0.1020 0.7494

Secondary Analysis

The analysis above uses number of school shootings as its outcome, but does not account for the severity of the events in terms of number of victims/fatalities. Instead, it treats all shooting events equally. Next, we perform a secondary analysis using the number of victims from all school shootings in a given state and year as the outcome of interest (total wounded or killed including shooter), rather than just incident count. For any state and year combination with no school shootings recorded, we assign the value 0 for number of victims. Hence, this analysis will treat shootings with no victims (no persons wounded or killed) the same as days in which no shootings occur. We do this so that the analysis can be interpreted as number of victims of school shootings (injuries/fatalities) that could be avoided with different gun policies.

Model Results

We fit negative binomial models using the same predictors as the earlier analysis, first using state gun policy grades (on 4.0 GPA scale) and next using the variables corresponding to the four types of laws with number of victims per year in each state as the outcome. The rest of the covariates are the same as above. We choose to fit negative binomial models in this case because Poisson models fit poorly (p value of 0 and estimated dispersion parameter of 2.39).

Model using grades:

The first model we fit uses state gun policy grades on a 4.0 GPA scale as well as the other variables mentioned above as predictors to model number of victims from school shootings. We see from the model that for every one unit change in GPA we estimate a 19.1% drop (95% C.I. (-34.88%,0.54%) percent change in victim count) in number of victims (per state in a given year). We check our model fit with a deviance test and do not get evidence against our model (p = 0.993).

# why we didnt fit poisson models
pois_fit = glm(total_victims~gpa+poverty+bullying_score+ prev_year_art+year+suicide_score+fight_score+school_safety_score+bully_score+ offset(log(population)),  data = ss_counts%>% mutate(total_victims = ifelse(is.na(total_victims),0,total_victims)), family = "poisson")

#estimate of dispersion
disp = pois_fit$deviance/pois_fit$df.residual
#2.39

# fit model written above
fit_grade_victims = MASS::glm.nb(total_victims~gpa+poverty+bullying_score+ prev_year_art+year+suicide_score+fight_score+school_safety_score+bully_score+ offset(log(population)),  data = ss_counts%>%
  mutate(total_victims = ifelse(is.na(total_victims),0,total_victims)))

# table with model output
kbl_vict = broom::tidy(fit_grade_victims)
kbl_vict$term = c("Intercept","GPA", "Poverty", "Bullying Laws score", "Articles in previous year", "Year","Suicide score", "Fight score" , "School safety score", "Bullying score")
kableExtra::kable( kbl_vict, align = "c" , digits = 4, caption = "GLM Output Using State Policy Grades as Covariates and Number of Victims as the Outcome")%>%
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover"))
GLM Output Using State Policy Grades as Covariates and Number of Victims as the Outcome
term estimate std.error statistic p.value
Intercept 24.4992 178.8202 0.1370 0.8910
GPA -0.2118 0.1108 -1.9113 0.0560
Poverty -0.0625 0.0419 -1.4910 0.1360
Bullying Laws score 0.1116 0.0644 1.7330 0.0831
Articles in previous year 0.0003 0.0005 0.5941 0.5524
Year -0.0207 0.0889 -0.2325 0.8161
Suicide score 11.2271 6.7357 1.6668 0.0956
Fight score 5.8992 3.2758 1.8008 0.0717
School safety score -16.5938 6.7962 -2.4416 0.0146
Bullying score 0.9127 5.5661 0.1640 0.8698
# check model fit
check_fit = 1-pchisq(fit_grade_victims$deviance, df = fit_grade_victims$df.residual)
# estimate 
# access laws estimate and conf int
gpa_est = paste( round(1-exp(-0.2118), digits = 4)*-100, "%", sep = "")
gpa_ci_l = paste( round(1-exp(-0.2118 - 1.96*   0.1108), digits = 4)*-100, "%", sep = "")
gpa_ci_u = paste( round(1-exp(-0.2118 + 1.96*   0.1108), digits = 4)*-100, "%", sep = "")

Model using laws:

We next fit a similar model, this time replacing GPA with counts for each of the four categories of laws in effect in each state per year (restrictive laws - permissive laws). We choose to fit a negative binomial model once again for the same reasons as above.

# fit model
fit_law_victims = MASS::glm.nb(total_victims ~ year+prev_year_art + poverty + bullying_score + suicide_score + school_safety_score + fight_score + bully_score + access_laws + 
    transport + screening + ban+ offset(log(population)), data = ss_counts%>%
  mutate(total_victims = ifelse(is.na(total_victims),0,total_victims))) # need to add in zero for rows with no shootings

# table with model output
kbl3 = broom::tidy(fit_law_victims)
kbl3$term = c("Intercept","Year","Articles in previous year","Poverty", "Bullying Laws score",  "Suicide score",  "School safety score", "Fight score" ,"Bullying score", "Access Laws", "Transport Laws", "Screening Laws", "Bans")
kableExtra::kable( kbl3, align = "c" , digits = 4, caption = "GLM Output Using State Laws as Covariates and Number of Victims as the Outcome")%>%
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover"))
GLM Output Using State Laws as Covariates and Number of Victims as the Outcome
term estimate std.error statistic p.value
Intercept -22.8540 91.2353 -0.2505 0.8022
Year 0.0031 0.0454 0.0687 0.9452
Articles in previous year 0.0005 0.0004 1.2542 0.2098
Poverty -0.0654 0.0324 -2.0201 0.0434
Bullying Laws score 0.0894 0.0502 1.7805 0.0750
Suicide score 16.1287 6.2710 2.5720 0.0101
School safety score -16.1897 6.0430 -2.6791 0.0074
Fight score 2.7016 2.6361 1.0249 0.3054
Bullying score -1.7542 4.3870 -0.3999 0.6893
Access Laws -0.1089 0.0378 -2.8825 0.0039
Transport Laws 0.2066 0.0872 2.3691 0.0178
Screening Laws 0.0722 0.0369 1.9573 0.0503
Bans -0.4477 0.1708 -2.6215 0.0088

Here we see that there is evidence of effect for all four types of gun laws. Below we show the estimated change in number of victims from school shootings per unit change in each type of gun law (negative values correspond to decreases in victim counts). For context, the average response value in the data set is 0.8564 victims per state and year. Interestingly, in this analysis screening and transport laws both have positive coefficients, indicating that stricter laws are associated with more fatalities/injuries from school shootings. While the effect of screening laws is relatively small, the transport law effect is sizable. Further analysis may want to look at these laws in further detail to understand this trend.

# predict on raw scale for variables of interest
# include estimated change and conf int

# bans
bans_est = paste( round(1-exp(-0.4477), digits = 4)*-100, "%", sep = "")
bans_ci_l = paste( round(1-exp(-0.4477  -1.96*0.1708), digits = 4)*-100, "%", sep = "")
bans_ci_u = paste( round(1-exp(-0.4477  +1.96*0.1708), digits = 4)*-100, "%", sep = "")

# access laws
access_est = paste( round(1-exp(-0.1089), digits = 4)*-100, "%", sep = "")
access_ci_l = paste( round(1-exp(-0.1089-1.96*  0.0378), digits = 4)*-100, "%", sep = "")
access_ci_u = paste( round(1-exp(-0.1089+1.96*  0.0378), digits = 4)*-100, "%", sep = "")

#transport laws
trans_est = paste("+", round( 1-exp(0.2066), digits = 4)*-100, "%", sep = "")
trans_ci_l = paste( round( 1-exp(0.2066-1.96*   0.0872), digits = 4)*-100, "%", sep = "")
trans_ci_u = paste( round( 1-exp(0.2066+1.96*   0.0872), digits = 4)*-100, "%", sep = "")

#screening laws
screen_est = paste("+", round(1-exp(0.0722), digits = 4)*-100, "%", sep = "")
screen_ci_l = paste( round(1-exp(0.0722-1.96*   0.0369), digits = 4)*-100, "%", sep = "")
screen_ci_u = paste( round(1-exp(0.0722+1.96*   0.0369), digits = 4)*-100, "%", sep = "")

# print table
kableExtra::kable(data.frame(Variable = c("Access Laws","Bans","Transport Laws", "Screening Laws"), Estimate = c(access_est,bans_est, trans_est, screen_est), `C.I Lower`=c(access_ci_l,bans_ci_l, trans_ci_l,screen_ci_l) ,`C.I Upper`=c(access_ci_u,bans_ci_u, trans_ci_u, screen_ci_u)), caption = "Estimated Change in School Shooting Victims per Unit Increase in Law Type", align = "c")%>%
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover"), full_width = F)
Estimated Change in School Shooting Victims per Unit Increase in Law Type
Variable Estimate C.I.Lower C.I.Upper
Access Laws -10.32% -16.72% -3.42%
Bans -36.09% -54.27% -10.68%
Transport Laws +22.95% 3.63% 45.87%
Screening Laws +7.49% -0.01% 15.55%

Conclusions

Recommendation

The analysis showed explicit evidences that the states with more prohibitive gun laws policies had lesser number of school shooting experiences and thus leads us to suggest improving gun policies and imposing more restrictive gun laws may help to reduce the number of school shootings. The analysis of individual types of gun laws reveals that controlling the accessibility of guns and banning firearms may improve the prevalent school shooting rates. Based on the other findings of our primary analysis we would also suggest to pay more attention to the media coverage of the school shooting incidences. We would recommend media to not broadcast the name and details of the shooters rather focus more on the victims as often cases potential attackers may be motivated by the publicity. Youth mental health should given more priority and states can offer different behavioral interventions in schools to promote mental health.

Limitations

The gun law scores were assigned to each state based on the how restrictive and effective gun policies and laws the state implements, however there was not been much details provided in the website about the mechanism that was applied in the grading scores. The website does not clearly state which are the laws they have considered while grading the states and also how much weight each gun laws were assigned while calculating the score. Again the relative difference between two grades such as B to A and F to E, was been considered as equal change as in both cases the more gun policies or laws has been introduced and grades have improved by one scale but it ignores the fact that the state moving from F to E might had to struggle more to improve the situation than the state which already had a better judicial law about gun policies.

Similarly when we were investigating the effects of different gun laws separately, we assumed that the all the gun laws hold similar importance and we had assigned equal weights to all the laws. For all the different types of all, for each cases when there was a permissive law change we assigned -1 and for each restive law change we assigned +1. Again, as mentioned at the data section some of the laws were changed in the middle of the year and we decided to round the value to the nearest year which may cause some loss of information. In the dataset we have considered the law changes that took place after the year 2010. However, there can be some changes in the gun laws prior to 2010 which may still have some impact on the preceding years states gun policies and may have a confounding effect on our analysis and it is not possible to adjust for it. We are also aware of the fact that even if the laws were changed at a certain time point, it takes time for the changes to make a significant impact on the overall gun prevalence and gun violence thus it is difficult to quantify the effect of gun policies on school shootings.

It is ascertained that the state with higher population would have more number of schools and also might have more school shooting incidences and we wished to adjust for the population. Though we could have adjust the model only for the school going population but the dataset shows that there was a large number of shooters with age more than 18 years. Hence we decided to take into account the overall population of each state for each year instead of just the school going population.

As the literature view suggested we wished to adjust for the prevalence of mental heath illness in each state over the study period and collected data on the prevalent rate of youths who were suffering from major depression but did not receive any mental health treatment. However the exploratory data analysis revealed that there were extreme inconsistency in the data. The documentation clearly did not specify the study years and though the report 2018 and 2019 mentioned that they have used survey data from the year 2014-2015, the values were different. Because of the reports did not explicitly defined the survey years we decided not to include the data in our analysis. However, if the dataset was persistent, it would have added more information to our analysis.

Future Work

We selected the prevalent poverty rate as an indicator of the financial stability and economic condition of each state. As the literature review implies, we could have collected unemployment rate for each state and for each year but since the majority of the school shooters were students, it is coherent to assume that whether the students are coming from financially stable families or not would impact more on school shootings than employment rate. However, for future analysis we could account for unemployment rate, income per household and other economic variables to make the study more exhaustive.

Bullying at school is often considered as one of the crucial components of school shooting incidences and thus we wished to account for the variation in the existing bullying laws in different states. However the most credible dataset that was available for each state contained data only for the year 2010. So we included the variable to adjust for the variation in state laws assuming that the states maintained the same level of rigidity towards the bullying laws throughout the study period. The analysis would have been more thorough if we had data for all the years.

References

  1. Walker C. (2019, July) 10 years. 180 school shootings. 356 victims. Retrieved from: https://www.cnn.com/interactive/2019/07/us/ten-years-of-school-shootings-trnd/

  2. Federal Bureau of Investigation. A Study of Active Shooter Incidents in the United States Between 2000 and 2013. 2014; Available at www.fbi.gov/news/stories/2014/september (accessed January 30, 2015).

  3. Knowles H (2019, November 15) 16-year-old shooter at California high school has died, authorities say. Retrieved from:https://www.washingtonpost.com/nation/2019/11/15/year-old-santa-clarita-high-school-shooting-suspect-has-died-authorities-say/

  4. Towers, S., Gomez-Lievano, A., Khan, M., Mubayi, A., & Castillo-Chavez, C. (2015). Contagion in mass killings and school shootings. PLoS one, 10(7), e0117259.

  5. John Woodrow Cox, Steven Rich, Allyson Chiu, John Muyskens and Monica Ulmanu (Updated: 2019, November 14). More than 233,000 students have experienced gun violence at school since Columbine. Retrieved from: https://www.washingtonpost.com/graphics/2018/local/school-shootings-database/

  6. Breslau, N., Kessler, R. C., Chilcoat, H. D., Schultz, L. R., Davis, G. C., & Andreski, P. (1998). Trauma and posttraumatic stress disorder in the community: the 1996 Detroit Area Survey of Trauma. Archives of general psychiatry, 55(7), 626-632.

  7. Lowe, S. R., Blachman-Forshay, J., & Koenen, K. C. (2015). Epidemiology of trauma and trauma-related disorders: trauma as a public health issue. Evidence-based treatments for trauma-related psychological disorders: A practical guide for clinicians, 11-40.

  8. US Dept of Education. Final report and findings of the safe school initiative: Implications for the prevention of school attacks in the United States, 2004.

  9. Richardson EG, Hemenway D. Homicide, suicide, and unintentional firearm fatality: comparing the United States with other high-income countries, 2003. The Journal of Trauma and Acute Care Surgery. 2011; 70: 238–243.

  10. Hemenway D, Miller M. Firearm availability and homicide rates across 26 high-income countries. Journal of Trauma and Acute Care Surgery. 2000; 49(6): 985–988.

  11. Krug, E. G., Mercy, J. A., Dahlberg, L. L., & Powell, K. E. (1998). Firearm-and non-firearm-related homicide among children: an international comparison. Homicide Studies, 2(1), 83-95.

  12. CIA Factbook. The World Factbook; 2010. See also: http://www.ciagov/library/publications/the-world-factbook, accessed January 30, 2015.

  13. Metzl, J. M., & MacLeish, K. T. (2015). Mental illness, mass shootings, and the politics of American firearms. American journal of public health, 105(2), 240-249.

  14. Vossekuil, B., Fein, R. A., Reddy, M., Borum, R., & Modzeleski, W. (2002). The final report and findings of the safe school initiative: Implications for the prevention of school attacks in the United States. Washington, DC: U.S. Secret Service and U.S. Department of Education.

  15. Lee, J. H. (2013). School shootings in the U.S. public schools: Analysis through the eyes of an educator. Review of Higher Education and Self-Learning, 6, 88–120.

  16. Pah, A. R., Hagan, J., Jennings, A. L., Jain, A., Albrecht, K., Hockenberry, A. J., & Amaral, L. A. N. (2017). Economic insecurity and the rise in gun violence at US schools. Nature Human Behaviour, 1(2), 0040.

  17. Jetter, M., & Walker, J. (2018). The Effect of Media Coverage on Mass Shootings.

  18. Christensen, J. (2017). Why the US has the most mass shootings. CNN, published August 27, 2015.